Using CCA to improve CCA: A new spectral method for estimating vector models of words
Abstract
Unlabeled data is often used to learn representations that can supplement baseline features in a supervised learner. For example, in text applications where the words lie in a very high-dimensional space (the size of the vocabulary), one can learn a low-rank “dictionary” by an eigendecomposition of the word co-occurrence matrix (e.g., using PCA or CCA). In this paper, we present a new spectral method based on CCA to learn an eigenword dictionary. Our improved procedure computes two sets of CCAs: the first between the left and right contexts of a given word, and the second between the projections resulting from this CCA and the word itself. We prove theoretically that this two-step procedure has lower sample complexity than the simple single-step procedure. We also illustrate the empirical efficacy of our approach and the richness of the representations learned by our Two Step CCA (TSCCA) procedure on the tasks of POS tagging and sentiment classification.
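The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that do not come from the abstract: contexts are a single word on each side, all matrices are dense one-hot encodings, CCA is computed by an SVD of the regularized, whitened cross-covariance matrix, and the left and right projections are combined by simple concatenation. The names `inv_sqrt`, `cca`, `one_hot`, and `two_step_cca` are hypothetical helpers, and the toy corpus is random, so the resulting vectors carry no meaning; only the data flow and shapes are shown.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(np.maximum(w, 1e-12))) @ V.T

def cca(X, Y, k, reg=1e-3):
    """Top-k canonical projection matrices for X and Y (regularized; an assumption)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Wx = inv_sqrt(X.T @ X / n + reg * np.eye(X.shape[1]))
    Wy = inv_sqrt(Y.T @ Y / n + reg * np.eye(Y.shape[1]))
    # SVD of the whitened cross-covariance gives the canonical directions.
    U, s, Vt = np.linalg.svd(Wx @ (X.T @ Y / n) @ Wy, full_matrices=False)
    return Wx @ U[:, :k], Wy @ Vt[:k].T

def one_hot(ids, v):
    """n x v indicator matrix for a sequence of word ids."""
    M = np.zeros((len(ids), v))
    M[np.arange(len(ids)), ids] = 1.0
    return M

def two_step_cca(tokens, v, k):
    W = one_hot(tokens[1:-1], v)   # the word itself
    L = one_hot(tokens[:-2], v)    # left context (one word; an assumption)
    R = one_hot(tokens[2:], v)     # right context (one word; an assumption)
    # Step 1: CCA between left and right contexts.
    phi_l, phi_r = cca(L, R, k)
    # Combine the projected contexts (concatenation is an assumption).
    S = np.hstack([L @ phi_l, R @ phi_r])
    # Step 2: CCA between the projected contexts and the word.
    _, phi_w = cca(S, W, k)
    return phi_w                   # one k-dimensional row per vocabulary word

rng = np.random.default_rng(0)
tokens = rng.integers(0, 20, size=2000)   # random toy corpus over 20 word ids
eigenwords = two_step_cca(tokens, v=20, k=5)
print(eigenwords.shape)
```

For a realistic vocabulary the one-hot matrices would need to be sparse and the covariance inverses approximated; the dense version here is only meant to make the two steps concrete.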
Similar resources
Two Step CCA: A new spectral method for estimating vector models of words
Unlabeled data is often used to learn representations which can be used to supplement baseline features in a supervised learner. For example, for text applications where the words lie in a very high dimensional space (the size of the vocabulary), one can learn a low rank “dictionary” by an eigendecomposition of the word co-occurrence matrix (e.g. using PCA or CCA). In this paper, we present a n...
A vector space model of semantics using Canonical Correlation Analysis
We present an efficient method that uses canonical correlation analysis (CCA) between words and their contexts (i.e., the neighboring words) to estimate a real-valued vector for each word that characterizes its “hidden state” or “meaning”. The use of CCA allows us to prove theorems characterizing how accurately we can estimate this hidden state. Recently developed algorithms for computing the r...
Comparison of PSDA and CCA detection methods in a SSVEP-based BCI-system
Using steady-state visually evoked potential (SSVEP) in brain-computer interface (BCI) systems is the subject of a lot of research. One of the most popular and widely used detection methods is power spectral density analysis (PSDA). Lately some new methods have been emerging; one of them uses canonical correlation analysis (CCA), which seems to have some promising improvements an...
Using semantic data to improve cross-lingual linking of article clusters
This paper presents a system that uses semantic data to improve cross-lingual linking of news article clusters. Two approaches are compared. The first is based on two different Canonical Correlation Analysis (CCA) feature vector definitions, MAX–CCA and SUM–CCA, whereas the second has been developed using a better-performing CCA approach in combination with Entity vectors. The aim of the compar...
Facial Expression Recognition using Spectral Supervised Canonical Correlation Analysis
Feature extraction plays an important role in facial expression recognition. Canonical correlation analysis (CCA), which studies the correlation between two random vectors, is a major linear feature extraction method based on feature fusion. Recent studies have shown that facial expression images often reside on a latent nonlinear manifold. However, either CCA or its kernel version KCCA, which ...
Publication date: 2012